Let $A$ be an $n \times n$ Hermitian matrix and let $B$ be the $(n-1) \times (n-1)$ matrix constructed by deleting the $i$-th row and $i$-th column of $A$. Denote $\Phi = [\phi(x_1), \ldots, \phi(x_n)]^{\top} \in \mathbb{R}^{n \times D}$, where $D$ is the dimension of the feature space $\mathcal{H}$. Performing a rank-$n$ singular value decomposition (SVD) on $\Phi$ gives $\Phi = H \Sigma V^{\top}$, where $H \in \mathbb{R}^{n \times n}$, $\Sigma \in \mathbb{R}^{n \times n}$ is a diagonal matrix whose diagonal elements are the singular values of $\Phi$, and $V \in \mathbb{R}^{D \times n}$. $F(\alpha)$ in Eq. (21) is proven differentiable, and the $p$-th component of its gradient, $\partial F(\alpha) / \partial \alpha_p$, can be computed in closed form. A reduced gradient descent algorithm [26] is then adopted to optimize Eq. (21). The three deep neural networks are pre-trained on ImageNet [5].
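As a minimal numerical sketch of the decomposition above (random features stand in for the maps $\phi(x_i)$; all names are illustrative), a rank-$n$ SVD of an $n \times D$ matrix can be computed and checked as:

```python
import numpy as np

# Toy stand-in for Phi = [phi(x_1), ..., phi(x_n)]^T in R^{n x D}:
# random features instead of a real feature map.
rng = np.random.default_rng(0)
n, D = 5, 8
Phi = rng.standard_normal((n, D))

# Rank-n (thin) SVD: Phi = H @ diag(sigma) @ Vt, with H in R^{n x n},
# sigma holding the n singular values, and Vt = V^T in R^{n x D}.
H, sigma, Vt = np.linalg.svd(Phi, full_matrices=False)

# Reconstruction and orthonormality checks.
assert np.allclose(Phi, H @ np.diag(sigma) @ Vt)
assert np.allclose(H.T @ H, np.eye(n), atol=1e-10)
assert np.all(sigma[:-1] >= sigma[1:])  # singular values sorted descending
```

With `full_matrices=False`, NumPy returns exactly the thin factors matching the rank-$n$ decomposition in the text.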
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
A Proofs

A.1 Proof of Proposition 4.1

Proof. The first lemma is Lemma 3 in [24]: let A be a Hermitian matrix and let B be a Hermitian perturbation of A. To apply Lemma A.1, we must study the relationship between the minimum eigenvalue gaps of the two matrices. By Lemma A.2, together with the proof of Theorem 5.2, the claimed bound follows.

A.5 The Optimization of SimpleMKKM

SimpleMKKM aims to solve a kernel alignment-based min-max optimization problem over the kernel weights. Assume that the number of iterations is T.

Table 4: Large-scale datasets used in the experiments

Dataset  | Samples | Views | Clusters
NUSWIDE  | 30000   | 5     | 31
AwA      | 30475   | 6     | 50
MNIST    | 60000   | 3     | 10
YtVideo  | 101499  | 5     | 31

B.2 Clustering Performance with Different Numbers of Landmarks

As the number of landmarks increases, the ACC of the proposed method approaches that of SimpleMKKM and tends to stabilize, showing that not many landmarks are needed. To verify the assumptions about the eigenvalues of the empirical kernel matrix in Theorem 5.2, and to give further empirical evidence for the proposed method, we conduct additional experiments against three classic algorithms, including average multiple kernel k-means. The results are reported in the following three tables.
- North America > United States > New York (0.04)
- Asia > Taiwan (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
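The reduced gradient descent mentioned above must keep the kernel weights $\alpha$ on the probability simplex while descending. A minimal sketch, with a toy quadratic objective standing in for $F(\alpha)$ (the step rule and learning rate are illustrative, not the paper's exact update):

```python
import numpy as np

def reduced_gradient_step(alpha, grad, lr=0.1):
    """One reduced-gradient step that keeps alpha on the probability simplex.

    The equality constraint sum(alpha) = 1 is eliminated by expressing the
    component with the largest weight (index u) through the others."""
    u = int(np.argmax(alpha))
    red = grad - grad[u]                    # reduced gradient; entry u is 0
    direction = -red
    # components sitting at the boundary alpha_p = 0 may not decrease further
    direction[(alpha <= 0) & (direction < 0)] = 0.0
    direction[u] = -np.sum(np.delete(direction, u))  # preserve sum(alpha) = 1
    new = np.clip(alpha + lr * direction, 0.0, None)
    return new / new.sum()

# Toy objective F(alpha) = 0.5 * alpha^T Q alpha with a fixed PSD matrix Q;
# its simplex-constrained minimizer equalizes the gradient components.
Q = np.array([[2.0, 0.1], [0.1, 1.0]])
alpha = np.array([0.5, 0.5])
for _ in range(50):
    alpha = reduced_gradient_step(alpha, Q @ alpha)

assert abs(alpha.sum() - 1.0) < 1e-9 and np.all(alpha >= 0)
```

At convergence the reduced gradient vanishes, i.e. all gradient components of the active weights are equal, which is the simplex-constrained optimality condition.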
DUOL: A Double Updating Approach for Online Learning
Peilin Zhao, Steven C. Hoi, Rong Jin
In most online learning algorithms, the weights assigned to the misclassified examples (or support vectors) remain unchanged during the entire learning process. This is clearly insufficient: when a new misclassified example is added to the pool of support vectors, we generally expect it to affect the weights of the existing support vectors. In this paper, we propose a new online learning method, termed Double Updating Online Learning, or DUOL for short. Instead of only assigning a fixed weight to the misclassified example received in the current trial, the proposed online learning algorithm also updates the weight of one of the existing support vectors. We show that the mistake bound can be significantly improved by the proposed online learning method. Encouraging experimental results show that the proposed technique is in general considerably more effective than state-of-the-art online learning algorithms.
- Asia > Singapore (0.04)
- North America > United States > Michigan > Ingham County > Lansing (0.04)
- North America > United States > Michigan > Ingham County > East Lansing (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
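The double-updating idea in the DUOL abstract above can be sketched as a kernel perceptron that, on each mistake, also boosts one conflicting existing support vector. This is a hedged illustration only: the class name, RBF kernel, conflict threshold, and fixed weight increments are my own choices, not the paper's closed-form dual updates.

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

class DoubleUpdateKernelPerceptron:
    """Sketch of a double-updating online learner (DUOL-style)."""

    def __init__(self, eta=1.0, conflict_thresh=0.5):
        self.sv_x, self.sv_y, self.sv_a = [], [], []
        self.eta, self.thresh = eta, conflict_thresh

    def decision(self, x):
        return sum(a * y * rbf(x, z) for a, y, z in
                   zip(self.sv_a, self.sv_y, self.sv_x))

    def fit_one(self, x, y):
        if y * self.decision(x) > 0:
            return False                      # correct, no update
        # first update: add the misclassified example as a support vector
        self.sv_x.append(x); self.sv_y.append(y); self.sv_a.append(self.eta)
        # second update: boost the most conflicting EXISTING support vector
        # (high kernel similarity to x but opposite label contribution)
        best, best_conf = None, self.thresh
        for b in range(len(self.sv_x) - 1):
            conf = -y * self.sv_y[b] * rbf(x, self.sv_x[b])
            if conf > best_conf:
                best, best_conf = b, conf
        if best is not None:
            self.sv_a[best] += self.eta
        return True

rng = np.random.default_rng(1)
clf = DoubleUpdateKernelPerceptron()
mistakes = 0
for _ in range(100):
    x = rng.standard_normal(2)
    y = 1.0 if x[0] + x[1] > 0 else -1.0
    mistakes += clf.fit_one(x, y)
```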
Detecting Backdoor Samples in Contrastive Language Image Pretraining
Huang, Hanxun, Erfani, Sarah, Li, Yige, Ma, Xingjun, Bailey, James
Contrastive language-image pretraining (CLIP) has been found to be vulnerable to poisoning backdoor attacks where the adversary can achieve an almost perfect attack success rate on CLIP models by poisoning only 0.01\% of the training dataset. This raises security concerns about the current practice of pretraining large-scale models on unscrutinized web data using CLIP. In this work, we analyze the representations of backdoor-poisoned samples learned by CLIP models and find that they exhibit unique characteristics in their local subspace, i.e., their local neighborhoods are far sparser than those of clean samples. Based on this finding, we conduct a systematic study on detecting CLIP backdoor attacks and show that these attacks can be easily and efficiently detected by traditional density ratio-based local outlier detectors, whereas existing backdoor sample detection methods fail. Our experiments also reveal that an unintentional backdoor already exists in the original CC3M dataset and has been trained into a popular open-source model released by OpenCLIP. Based on our detector, one can clean up a million-scale web dataset (e.g., CC3M) efficiently within 15 minutes using 4 Nvidia A100 GPUs. The code is publicly available in our \href{https://github.com/HanxunH/Detect-CLIP-Backdoor-Samples}{GitHub repository}.
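A density ratio-based local outlier detector of the kind the abstract describes can be sketched with a simplified LOF-style score (the function name, the choice of k, and the toy data are illustrative; production detectors such as LOF refine this with reachability distances):

```python
import numpy as np

def knn_density_ratio_scores(X, k=5):
    """Simplified density-ratio local outlier score (LOF-like sketch).

    score(i) = mean k-NN density of i's neighbors / k-NN density of i,
    where density(i) = 1 / (mean distance to its k nearest neighbors).
    Samples in sparse local neighborhoods, the backdoor signature described
    above, receive scores well above 1."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # exclude self-distance
    nn = np.argsort(d, axis=1)[:, :k]         # indices of k nearest neighbors
    knn_dist = np.take_along_axis(d, nn, axis=1).mean(axis=1)
    density = 1.0 / (knn_dist + 1e-12)
    return density[nn].mean(axis=1) / density

rng = np.random.default_rng(0)
clean = rng.standard_normal((200, 8)) * 0.5   # dense, clean-like cluster
outlier = np.full((1, 8), 6.0)                # isolated, sparse neighborhood
scores = knn_density_ratio_scores(np.vstack([clean, outlier]))
assert np.argmax(scores) == 200               # the isolated sample scores highest
```

The pairwise-distance matrix makes this O(n^2) in memory; at million scale one would use approximate nearest-neighbor search instead.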
Bias Detection via Maximum Subgroup Discrepancy
Němeček, Jiří, Kozdoba, Mark, Kryvoviaz, Illia, Pevný, Tomáš, Mareček, Jakub
Bias evaluation is fundamental to trustworthy AI, both in terms of checking data quality and in terms of checking the outputs of AI systems. In testing data quality, for example, one may study the distance from a given dataset, viewed as a distribution, to a ground-truth reference dataset. However, classical metrics, such as the Total Variation and Wasserstein distances, are known to have high sample complexities and, therefore, may fail to provide meaningful distinctions in many practical scenarios. In this paper, we propose a new notion of distance, the Maximum Subgroup Discrepancy (MSD). In this metric, two distributions are close if, roughly, discrepancies are low for all feature subgroups. While the number of subgroups may be exponential, we show that the sample complexity is linear in the number of features, making the metric feasible for practical applications. Moreover, we provide a practical algorithm for evaluating the distance, based on mixed-integer optimization (MIO). We also note that the proposed distance is easily interpretable, thus providing clearer paths to fixing biases once they have been identified, and it provides guarantees for all subgroups. Finally, we empirically evaluate MSD, compare it with other metrics, and demonstrate the above properties on real-world datasets.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Mississippi (0.05)
- North America > United States > Maine (0.05)
- (11 more...)
- Law (1.00)
- Government > Regional Government > North America Government > United States Government (0.67)
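The MSD notion described above can be illustrated by brute force on binary features (the paper uses mixed-integer optimization precisely to avoid this exponential enumeration; all names here are illustrative):

```python
import itertools
import numpy as np

def max_subgroup_discrepancy(A, B):
    """Brute-force sketch of Maximum Subgroup Discrepancy over binary features.

    A subgroup is a conjunction fixing a subset of features to 0/1 values;
    MSD is the largest absolute difference between the fraction of A and the
    fraction of B falling in any such subgroup."""
    n_feat = A.shape[1]
    best = 0.0
    for size in range(1, n_feat + 1):
        for feats in itertools.combinations(range(n_feat), size):
            for vals in itertools.product([0, 1], repeat=size):
                in_a = np.all(A[:, list(feats)] == vals, axis=1).mean()
                in_b = np.all(B[:, list(feats)] == vals, axis=1).mean()
                best = max(best, abs(in_a - in_b))
    return best

# Two toy binary datasets whose largest gap is on the subgroup (f0=1, f1=1):
# it covers half of A but none of B.
A = np.array([[1, 1], [1, 1], [0, 1], [0, 0]])
B = np.array([[1, 0], [0, 1], [0, 1], [0, 0]])
msd = max_subgroup_discrepancy(A, B)   # 0.5, attained at (f0=1, f1=1)
```

The interpretability claim is visible here: the maximizing conjunction itself names the subgroup where the two datasets disagree most.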
Reviving The Classics: Active Reward Modeling in Large Language Model Alignment
Shen, Yunyi, Sun, Hao, Ton, Jean-François
Building neural reward models from human preferences is a pivotal component in reinforcement learning from human feedback (RLHF) and large language model alignment research. Given the scarcity and high cost of human annotation, how to select the most informative pairs to annotate is an essential yet challenging open problem. In this work, we highlight the insight that an ideal comparison dataset for reward modeling should balance exploration of the representation space with informative comparisons between pairs with moderate reward differences. Technically, challenges arise in quantifying the two objectives and efficiently prioritizing the comparisons to be annotated. To address this, we propose Fisher information-based selection strategies, adapting theory from the classical experimental design literature and applying it to the final linear layer of deep neural network-based reward models. Empirically, our method demonstrates remarkable performance, high computational efficiency, and stability compared to other selection methods from the deep learning and classical statistical literature across multiple open-source LLMs and datasets. Further ablation studies reveal that incorporating cross-prompt comparisons in active reward modeling significantly enhances labeling efficiency, shedding light on the potential for improved annotation strategies in RLHF.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
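The Fisher information-based selection described above can be sketched, for a Bradley-Terry reward model with a linear last layer, as greedy D-optimal pair selection (the greedy rule, ridge term, and all names are my assumptions, not the paper's exact algorithm):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def greedy_d_optimal_pairs(feats, w, n_pick, ridge=1e-3):
    """Greedily pick comparison pairs maximizing log-det Fisher information.

    For a pair (i, j) with feature difference d = x_i - x_j and win
    probability p = sigmoid(w . d), the Fisher contribution is
    p * (1 - p) * d d^T. The rule therefore favors spread-out directions
    (exploration) AND moderate predicted reward gaps (p near 1/2)."""
    n, dim = feats.shape
    info = ridge * np.eye(dim)                 # ridge keeps log-det finite
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    chosen = []
    for _ in range(n_pick):
        best, best_gain = None, -np.inf
        for (i, j) in pairs:
            d = feats[i] - feats[j]
            p = sigmoid(w @ d)
            cand = info + p * (1 - p) * np.outer(d, d)
            gain = np.linalg.slogdet(cand)[1]
            if gain > best_gain:
                best, best_gain = (i, j), gain
        chosen.append(best)
        i, j = best
        d = feats[i] - feats[j]
        p = sigmoid(w @ d)
        info += p * (1 - p) * np.outer(d, d)   # commit the chosen pair
        pairs.remove(best)
    return chosen

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4))            # toy last-layer embeddings
picked = greedy_d_optimal_pairs(feats, w=rng.standard_normal(4), n_pick=3)
```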
How Does the Spatial Distribution of Pre-training Data Affect Geospatial Foundation Models?
Purohit, Mirali, Muhawenayo, Gedeon, Rolf, Esther, Kerner, Hannah
Foundation models have made rapid advances in many domains including Earth observation, where Geospatial Foundation Models (GFMs) can help address global challenges such as climate change, agriculture, and disaster response. Previous work on GFMs focused on tailoring model architectures and pretext tasks, and did not investigate the impact of pre-training data selection on model performance. However, recent work in other domains shows that the pre-training data distribution is an important factor influencing the performance of foundation models. With this motivation, our research explores how the geographic distribution of pre-training data affects the performance of GFMs. We evaluated several pre-training data distributions by sampling different compositions from a global data pool. Our experiments with two GFMs on downstream tasks indicate that balanced and globally representative data compositions often outperform region-specific sampling, highlighting the importance of diversity and global coverage in pre-training data. Our results also suggest that the most appropriate data sampling technique may depend on the specific GFM architecture. These findings will support the development of robust GFMs by incorporating quality pre-training data distributions, ultimately improving machine learning solutions for Earth observation.
- South America (0.04)
- Oceania (0.04)
- Europe (0.04)
- (7 more...)
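The balanced, globally representative composition the abstract favors can be sketched as stratified draws with (near-)equal counts per region, in contrast to region-specific sampling (region labels and pool sizes are toy stand-ins):

```python
from collections import Counter

import numpy as np

def balanced_sample(regions, n_total, rng):
    """Draw a pre-training subset with (near-)equal counts per region.

    A region contributes fewer samples only if its pool is smaller than
    its equal share."""
    regions = np.asarray(regions)
    unique = np.unique(regions)
    per_region = n_total // len(unique)
    idx = []
    for r in unique:
        pool = np.flatnonzero(regions == r)
        take = min(per_region, len(pool))
        idx.extend(rng.choice(pool, size=take, replace=False))
    return np.array(idx)

rng = np.random.default_rng(0)
# toy global pool heavily skewed toward one region
regions = ["NA"] * 700 + ["EU"] * 200 + ["AF"] * 100
idx = balanced_sample(regions, n_total=150, rng=rng)
counts = Counter(np.asarray(regions)[idx])     # 50 samples per region
```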
ML$^2$Tuner: Efficient Code Tuning via Multi-Level Machine Learning Models
Cha, JooHyoung, Lee, Munyoung, Kwon, Jinse, Lee, Jubin, Lee, Jemin, Kwon, Yongin
The increasing complexity of deep learning models necessitates specialized hardware and software optimizations, particularly for deep learning accelerators. Existing autotuning methods often suffer from prolonged tuning times due to profiling invalid configurations, which can cause runtime errors. We introduce ML$^2$Tuner, a multi-level machine learning tuning technique that enhances autotuning efficiency by incorporating a validity prediction model to filter out invalid configurations, along with an advanced performance prediction model that utilizes hidden features from the compilation process. Experimental results on an extended VTA accelerator demonstrate that ML$^2$Tuner achieves equivalent performance improvements using only 12.3% of the samples required by a similar approach in TVM, and reduces invalid profiling attempts by an average of 60.8%, highlighting its potential to enhance autotuning performance by filtering out invalid configurations.
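The multi-level idea in the abstract, screening configurations with a cheap validity predictor before the costly profiling step ever runs, can be sketched as follows (the callables and the toy validity rule are illustrative, not ML$^2$Tuner's actual models):

```python
import numpy as np

def tune_with_validity_filter(candidates, validity_model, profile, threshold=0.5):
    """Profile only configurations the validity model predicts will not crash.

    Returns the best (lowest-cost) configuration and how many candidates
    were skipped by the validity filter."""
    probs = np.array([validity_model(c) for c in candidates])
    keep = [c for c, p in zip(candidates, probs) if p >= threshold]
    results = {tuple(c): profile(c) for c in keep}   # expensive step, filtered
    return min(results, key=results.get), len(candidates) - len(keep)

# Toy setup: a config (tile, unroll) is invalid when tile * unroll > 64.
def toy_validity(c):
    return 1.0 if c[0] * c[1] <= 64 else 0.0

def toy_profile(c):
    # pretend cost: lower is better, larger tiles/unrolls run faster
    return 100.0 / (c[0] + c[1])

cands = [(t, u) for t in (4, 8, 16, 32) for u in (1, 2, 4)]
best, skipped = tune_with_validity_filter(cands, toy_validity, toy_profile)
```

In the toy run only (32, 4) is filtered out, so the expensive `profile` call never sees the one configuration that would have failed.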